• Premise of the study: We present an alternative approach for molecular systematic studies that combines long PCR and
next-generation sequencing. Our approach can be used to generate templates from any DNA source for next-generation sequencing.
Here we test our approach by amplifying complete chloroplast genomes, and we present a set of 58 potentially universal
primers for angiosperms to do so. Additionally, this approach is likely to be particularly useful for nuclear and mitochondrial
regions.

• Methods and Results: Chloroplast genomes of 30 species across angiosperms were amplified to test our approach. Amplifica- 
tion success varied depending on whether PCR conditions were optimized for a given taxon. To further test our approach, some 
amplicons were sequenced on an Illumina HiSeq 2000. 

• Conclusions: Although here we tested this approach by sequencing plastomes, long PCR amplicons could be generated using 
DNA from any genome, expanding the possibilities of this approach for molecular systematic studies. 

Key words: angiosperms; chloroplast enrichment; long PCR; next-generation sequencing; plastome; universal chloroplast 
PCR primers. 



Advancements in next-generation sequencing (NGS) tech- 
nologies have permitted the assembly of large, genome-scale 
data sets that have shed light on the evolutionary history of 
many taxa (e.g., Parks et al., 2009; Moore et al., 2010; Xi et al., 
2012; Eaton and Ree, 2013; Tennessen et al., 2013). For plant 
phylogenetics, there has been a major focus on methods for 
chloroplast phylogenomics (e.g., Parks et al., 2009; Moore et al., 
2010), although methods for collecting phylogenomic data sets 
from the nuclear and mitochondrial genomes have also been 
developed (e.g., Straub et al., 2012; Eaton and Ree, 2013). Stull 
et al. (2013) developed a custom RNA probe set designed to 
capture angiosperm plastomes via solution-based hybridization. 
While their capture system was broadly successful, Stull et al. 
(2013) found that the most variable spacer regions were often 
captured at much-reduced coverage compared to more con- 
served regions, and were sometimes missed entirely if the target 
taxon was phylogenetically divergent from one of the 22 plas- 
tomes used in the bait design. Moreover, the current cost of the 
capture probes makes this method most efficient for projects
dealing with hundreds of species. Another commonly employed 
method for plant phylogenomic studies is genome skimming 
(Straub et al., 2012), which takes advantage of the fact that or- 
ganellar DNA and nuclear ribosomal DNA are present at high 
copy numbers in genomic DNA. However, a significant limita- 
tion of this method for systematic studies is that only high-copy 
number regions are recovered consistently across all samples, 
whereas regions with lower representation are only recovered 
in some samples and missed completely in others (Straub et al., 
2011). This can be problematic for molecular systematic stud- 
ies where missing data may result in misleading phylogenetic 
results (Lemmon et al., 2009). Moreover, being limited to high- 
copy regions in the genome becomes restrictive for experimen- 
tal design as it excludes putatively highly informative regions 
in the genome such as single-copy nuclear genes (e.g., the
single-copy orthologous genes [COSII] and the pentatricopeptide
repeat [PPR] gene family; Wu et al., 2006, and Yuan et al.,
2009, respectively).

As an alternative, we present an NGS approach that com- 
bines long PCR and Illumina sequencing to strategically com- 
pile phylogenomic data sets for molecular systematic studies. 
Long PCR, or long-range PCR, uses a combination of two poly- 
merases — a nonproofreading polymerase at high concentration 
and a proofreading polymerase at a lower concentration — to 
amplify DNA fragments that range between 3 and 15 kilobases 
(kb), although cases of extremely large fragments (22-42 kb)
have been reported (e.g., Cheng et al., 1994). Long PCR has
been used extensively in human genome projects (e.g., Craig
et al., 2008) and to sequence complete mitochondrial genomes
(e.g., Knaus et al., 2011; Alexander et al., 2013), using both
Sanger sequencing and NGS technologies. Here, we use long
PCR to generate chloroplast DNA templates for systematic 
studies using NGS. While we focus on whole chloroplast am- 
plification, this approach is directly translatable to targeted 
studies where only particular regions of the plastome are of in- 
terest (e.g., the inverted repeat or the small single-copy region). 
In addition, long PCR could also be very useful for the enrich- 
ment of mitochondrial and/or nuclear regions where intron sizes 
are large or unknown, as well as for regions that are difficult to 
assemble bioinformatically, such as repetitive regions. 

Our focus on the chloroplast genome is driven by its phyloge- 
netic informativeness at essentially all taxonomic scales and its 
relative ease of amplification (e.g., Downie and Palmer, 1992; 
Graham and Olmstead, 2000; Moore et al., 2007; Parks et al., 
2009; Moore et al., 2010), which have made the chloroplast the 
workhorse of molecular plant systematics since the beginning 
of the field. Moreover, the availability of a large number of 
angiosperm plastome sequences has facilitated the design of
potentially universal PCR primers. To test this approach, we 
amplified the chloroplast genomes of 30 species (17 genera) 
across angiosperms using a set of 58 chloroplast PCR primers 
that were designed to potentially be universal in angiosperms 
and that may work in some gymnosperm lineages. 

METHODS AND RESULTS 

Representatives of 17 different genera (30 spp.) spanning 12 orders of angiosperms
sensu APG III (Angiosperm Phylogeny Group, 2009) were chosen
to test this approach (Table 1). Special focus was given to three genera in
Orobanchaceae: Lamourouxia Kunth (one species), Bartsia L. (two species),
and Castilleja Mutis ex L. f. (12 species). High-quality genomic DNA was extracted
from ca. 0.02 g of silica gel-dried or herbarium tissue using a modified
2x cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle,
1987), yielding 30-70 ng/μL of DNA per sample. Using the 83 plastid gene
angiosperm alignments of Moore et al. (2010; Appendix S1), we developed 58
primers with a goal of maximizing universality across angiosperms (Table 2).
Conserved regions for primer design were identified by eye, and the primers
were tested with IDT OligoAnalyzer tools (Integrated DNA Technologies,
Coralville, Iowa, USA) to ensure that melting temperatures (Tm) were greater than
50°C and that there were no significant hairpins or self-dimerization problems.
From these, 16 overlapping primer combinations were chosen to amplify the
entire chloroplast genome in appropriately sized, overlapping fragments, making
sure to allow at least 100 bp of overlap between regions (Fig. 1, Table 2) to
minimize the decrease in sequencing depth usually associated with the ~30 bp
immediately adjacent to the primer sites (Cronn et al., 2008; Harismendy and
Frazer, 2009; Cronn et al., 2012).

PCRs were performed using a combination of two high-quality Taq polymerases,
QIAGEN Taq DNA Polymerase (5 units/μL) and QIAGEN HotStar
HiFidelity DNA Polymerase (2.5 units/μL) (QIAGEN, Valencia, California,
USA), to obtain amplification of fragments between 5 kb and 12 kb. The
QIAGEN HotStar HiFidelity DNA Polymerase was diluted to 0.2 units/μL by
combining 0.1 μL of 5x QIAGEN HotStar HiFidelity PCR buffer, 0.36 μL of
double-deionized water (ddH2O), and 0.04 μL of QIAGEN HotStar HiFidelity
DNA Polymerase (2.5 units/μL). Each PCR had a total volume of 25 μL, was
prepared on ice, and contained the following reagents: 2.5 μL of 10x PCR buffer
(QIAGEN CoralLoad or colorless, with 15 mM MgCl2), 1.0 μL MgCl2
(QIAGEN 25 mM), 0.75 μL of deoxyribonucleotide triphosphates (dNTPs,
each at 10 mM), 5.0 μL of 5x QIAGEN Q solution, 2.5 μL each of the forward and
reverse primers (each at 5 μM), 0.25 μL (1.25 units) of QIAGEN Taq DNA
Polymerase, 0.5 μL of the diluted QIAGEN HotStar HiFidelity DNA Polymerase
solution, 9 μL of ddH2O, and 1.0 μL of DNA template. Long PCR profiles
were as follows: preheat at 93°C, initial denaturation at 93°C for 3 min
followed by 35 cycles of denaturation at 93°C for 15 s, annealing at 48-68°C
(depending on the primer pair) for 30 s, and extension at 68°C for 5-12 min
(1 min/kb of target). To assess amplification, 2 μL of the final reactions were
examined on a 1% agarose gel with appropriate size standards and the final
products were kept at 4°C. The complete, step-by-step long PCR protocol can
be found in Appendix 1.
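
The reaction volumes given above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function names and the 10% pipetting overage are assumptions made for this example and are not part of the published protocol. It checks that the listed volumes sum to 25 μL, that the polymerase dilution corresponds to 0.2 units/μL, and it scales the cocktail for several reactions.

```python
from decimal import Decimal

# Per-reaction volumes in microliters, copied from the text (25 uL reaction).
PER_REACTION_UL = {
    "10x PCR buffer (15 mM MgCl2)": Decimal("2.5"),
    "MgCl2 (25 mM)": Decimal("1.0"),
    "dNTPs (10 mM each)": Decimal("0.75"),
    "5x Q solution": Decimal("5.0"),
    "forward primer (5 uM)": Decimal("2.5"),
    "reverse primer (5 uM)": Decimal("2.5"),
    "Taq DNA Polymerase (5 units/uL)": Decimal("0.25"),
    "diluted HotStar HiFidelity (0.2 units/uL)": Decimal("0.5"),
    "ddH2O": Decimal("9.0"),
}
TEMPLATE_UL = Decimal("1.0")  # DNA template is added separately to each tube


def reaction_total_ul():
    """Total volume of one reaction, cocktail plus template."""
    return sum(PER_REACTION_UL.values()) + TEMPLATE_UL


def diluted_hifi_units_per_ul():
    """Concentration of the diluted high-fidelity polymerase.

    Per reaction: 0.1 uL buffer + 0.36 uL water + 0.04 uL enzyme at 2.5 units/uL.
    """
    units = Decimal("0.04") * Decimal("2.5")
    volume = Decimal("0.1") + Decimal("0.36") + Decimal("0.04")
    return units / volume


def master_mix_ul(n_reactions, overage=Decimal("0.10")):
    """Scale the cocktail to n reactions.

    The overage (extra fraction to cover pipetting loss) is a hypothetical
    convenience parameter, not a value taken from the protocol.
    """
    factor = Decimal(n_reactions) * (1 + overage)
    return {reagent: volume * factor for reagent, volume in PER_REACTION_UL.items()}


if __name__ == "__main__":
    print("reaction total (uL):", reaction_total_ul())
    print("diluted HiFi (units/uL):", diluted_hifi_units_per_ul())
    for reagent, volume in master_mix_ul(10).items():
        print(f"{reagent}: {volume} uL")
```

Exact decimal arithmetic is used here so that sub-microliter volumes such as 0.04 μL do not pick up floating-point rounding noise.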
Table 2. Universal angiosperm primers used for chloroplast genome amplifications. The 16 primer combinations chosen for this study
are those listed with a region number; approximate amplicon sizes are given in kilobases (kb).^a

Region no.  Approx. size (kb)  Primer (F/R)    Primer sequence (5'-3')
1           8                  trnH.GUG.6R     CCTTRATCCACTTGGCTACAT
1                              psbK.195R       ACTTACAGCAGCTTGCCAAAC
2/2a        10.3/6.3           trnQ.UUG.50R    GGACGGAAGGATTCGAACC
2a                             atpH.17F        CTGCYGCTTCYGTTATTGCT
2b          4                  atpF.65R        CGGTATTAAACCCGAAACTCC
2/2b                           rpoC2.4805F     GYCGTATYGATTGGTTRAAAGG
3           7                  atpI.705R       CRGCTAAAGTTGCAAAAATAAGAGCT
3                              rpoC1.1670F     GRGATCAAATGGCTGTTCAT
4           9                  rpoC2.520R      GTTCGTACAGCAGTATCYACAAC
4                              petN.3R         GCCCAAGCRAGACTTACTATATCC
5           10.5               trnC.GCA.47F    CCCAGTTCAAATCCGGGT
5                              psaB.2170F      GCRGCTTTCTTGATTGCYTC
6           10                 trnfM.CAU.21R   GGTTATGAGCCTTGCGAGCTA
6                              trnT.UGU.17F    GGTTAGAGCATCGCATTTGTAATG
7           10.3               rps4.380R       GGTTTGCARCGATAACTTGGKATATC
7                              rbcL.178R       GTCCATGTACCAGTAGARGATTC
8           9.2                rbcL.2F         TGTCACCACAAACAGARACTAAAG
8                              psbJ.3F         GGCYGATACTACTGGAAGRAT
9           9.8                petA.920F       CTTCAAGAYCCATTACGTGTHCAAG
9                              psbB.160R       TRCCYTGTCTCCACATTGGAT
10          10.9               psbB.3F         GGGTTTRCCTTGGTATCGTGT
10                             rps3.17F.new    ATCCACTTGGTTTYMGACTTGG
11          8.7                rpl16.3R        AACCAACGAGTCACACACTAAGC
11/16                          ycf2.5100R      CAGATCATGAATGTTTGGAATCCAT
12          10                 ycf2.2300F      TCGGGATCCTRATGCATATAGATAC
12                             rps12.190F      GTTGCCAGAGTACGMTTAACCT
13          11                 rps12.360R      CCCTTGTTGACGATCCTTTACTC
13                             ycf1.59R        CCGACCACAACGACCGAAT
14/15       11.2               trnN.GUU.7R     CCGCTCTACCACTGAGCTAC
14                             ndhA.535F       GCTGCTCAATCDATTAGTTATGAA
15          10.5               ndhI.194R       CGAACRCATACTTCACAAGCAA
16          8.2                psbA.640F       GCTATGCATGGTTCYTTGGTAAC
-           -                  rps16.50R       CGAACATCAATTGCAACGATTCGATA
-           -                  rps16.50F       TATCGAATCGTTGCAATTGATGTTCG
-           -                  psbK.200F       GGCAAGCTGCTGTAAGTTTTCGA
-           -                  atpF.70F        GGGTTTAATACCGATATTTTAGCAAC
-           -                  trnR.UCU.45F    GGTATAGGTTCAAATCCTATTGGAC
-           -                  trnQ.UUG.47F    CGGAGGTTCGAATCCTTCC
-           -                  trnK.UUU.3R     GAGATGGCAACTCAATCGTTG
-           -                  trnK.UUU.3F     CAACGATTGAGTTGCCATCTC
-           -                  atpA.430F       CGTTCYGTATATGARCCTCTTCAAAC
-           -                  atpA.820F       ATCGMCAAATGTCTCTTCTATTAMG
-           -                  ccsA.890R       TCCAAGTAATAAANGCCCAAGTTTC
-           -                  trnR.ACG.15F    GAGGATTAGAGCACGTGG
-           -                  ycf1.70F        GTGGTCGGACTCTATTATGGAT
-           -                  trnL.UAG.18F    GGTAGACACGCTGCTCTTAGG
-           -                  trnL.UAG.19F    GTAGACACGCTGCTCTTAGGAAG
-           -                  rps12.320R      GGGTTCCTCGAACAATGTGATATC
-           -                  rpl2.550F       GTGCTGTAGCGAAACTGATTG
-           -                  rpl2.640F       TCAGCAACAGTCGGACARGT
-           -                  psbT.3F         TGGAAGCATTGGTTTATACATTYCT
-           -                  atpB.1290R      ARGGTTGTGATAAGAAACGYTCAA
-           -                  trnT.UGU.42F    GATGGTCATCGGTTCGATTC
-           -                  psbC.3R         AGTTCCATTAAAGAGCGTTTCC
-           -                  psbD.860F       CYGGTTTATGGATGAGYGCT
-           -                  rpoB.900R       CGTCGACCAATCYTTCCTAATTC
-           -                  rpoB.470R       CCRGGRCTTTGCAATATTTGATTG
-           -                  rpoC2.430R      ATRGGTAAATCAATCATTTGYCCTTG

Overlap between regions in bp:^b
Regions 1 & 2 = 542
Regions 1 & 2a = 542
Regions 2a & 2b = 627
Regions 2b & 3 = 2059
Regions 2 & 3 = 2059
Regions 3 & 4 = 1274
Regions 4 & 5 = 860
Regions 5 & 6 = 618
Regions 6 & 7 = 764
Regions 7 & 8 = 153
Regions 8 & 9 = 1216
Regions 9 & 10 = 135
Regions 10 & 11 = 771
Regions 11 & 12 = 2781
Regions 12 & 13 = 142
Regions 13 & 14 = 392
Regions 14 & 15 = 1911
Regions 16 & 1 = 840


a All primers are shown in the 5' to 3' direction; the name of each primer consists of three parts: the gene in which the primer is anchored, the approximate 
position of the primer within that gene, and either an "F" or an "R." It is important to note that the F and R designations do not indicate that the primer 
should be used as a forward or reverse primer; rather, they indicate the 5' to 3' orientation of the primer with respect to the gene — i.e., a primer that is 
designated as an "F" primer has its 5' to 3' orientation in the same orientation as the gene (i.e., on the forward strand), whereas an "R" primer is oriented 
in the direction opposite to the 5' to 3' orientation of the gene (i.e., on the reverse strand). 

b Overlap between regions is given in number of base pairs (bp), without taking the length of the primers into consideration. 
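
Because the F and R designations describe orientation relative to the gene, and several primers contain IUPAC ambiguity codes (e.g., R, Y, K, M, H, D, N), it can help to handle the sequences programmatically. The sketch below is a minimal illustration in Python; the function names are chosen for this example and are not from the study. It computes the reverse complement of a degenerate primer and counts how many distinct oligonucleotides a degenerate primer represents. As a check, two F/R pairs in Table 2 that are anchored at the same position (rps16.50R/rps16.50F and trnK.UUU.3R/trnK.UUU.3F) turn out to be exact reverse complements of each other.

```python
# IUPAC nucleotide codes and their complements (R<->Y, K<->M, B<->V, D<->H).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

# Number of bases each IUPAC code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "K": 2, "M": 2, "S": 2, "W": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def reverse_complement(primer):
    """Return the reverse complement of a primer, preserving ambiguity codes."""
    return primer.upper().translate(IUPAC_COMPLEMENT)[::-1]


def degeneracy(primer):
    """Count the distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= IUPAC_CHOICES[base]
    return count


if __name__ == "__main__":
    rps16_50R = "CGAACATCAATTGCAACGATTCGATA"
    rps16_50F = "TATCGAATCGTTGCAATTGATGTTCG"
    print(reverse_complement(rps16_50R) == rps16_50F)
    # petA.920F contains one Y (2 options) and one H (3 options).
    print(degeneracy("CTTCAAGAYCCATTACGTGTHCAAG"))
```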

Fig. 1. The final annotated chloroplast genome assembly of Bartsia inaequalis with the 16 overlapping primer combinations indicated. Note that the
primer combinations for regions 11, 12, 13, and 16 amplify both inverted repeat A and B in a single reaction. Photos by Simon Uribe-Convers.
Legend categories: tRNA; ATP synthase; cytochrome b6/f; NADH dehydrogenase; photosystem; ribosomal protein; rRNA; RNA polymerase; ycf; other.

For the three genera of Orobanchaceae in which PCR optimization was
performed, amplification of the fragments was straightforward and had an
average success rate of 89.7% (range = 73-100%). The most difficult regions to
amplify were regions 2 (trnQ(UUG)-rpoC2), 9 (petA-psbB), 10 (psbB-rps3),
and 14 (trnN(GUU)-ndhA), which are among the largest fragments (10.3 kb,
9.8 kb, 10.9 kb, and 11.2 kb, respectively; Table 2). It was possible to split
region 2 into two smaller fragments, 2a (trnQ(UUG)-atpH: 6.3 kb) and 2b
(atpF-rpoC2: 4 kb), which facilitated its amplification in several taxa. This
was not the case for regions 9, 10, and 14, for which multiple long PCR
experiments using varying amounts of DNA template were necessary to obtain
successful amplifications. Amplification outside of Orobanchaceae was
highly variable, with an average success rate of 70.8% (range = 22-100%),
and regions 5, 6, 9, 10, and 11 showed the lowest success. Importantly,
the results for these taxa were obtained after just two rounds of PCR where the
annealing temperatures were changed to either 48°C or 55°C. Although we
did not optimize the long PCRs for each group, we are confident that optimization
on a per group basis (e.g., increasing template volume, altering
annealing temperatures, and/or long PCR profiles) and/or the use of fresh
tissue for DNA extractions would improve success rates. Furthermore, if
genomic rearrangements and/or primer mismatches are present in certain
groups, primer combinations other than the 16 that were used here could be
tested (Table 2). Nevertheless, we successfully amplified all 16 regions in
seven species, whereas in the remaining 23 species it was only possible to
amplify between six (1 sp.) and 15 (8 spp.) regions (Table 1). These results
translate to 21 species having at least 12 regions amplified (114.7 kb based
on potential amplicon size), representing ca. 74% of the chloroplast genome
when considering only one copy of the inverted repeat. Even the species
with the smallest number of amplified fragments (Castilleja arvensis Cham.
& Schltdl.) was represented by ~73 kb of data, exemplifying the effectiveness
of this approach.
It is notable that many of the DNAs that were tested were extracted from
herbarium tissues that ranged from five to 25 yr old when isolated. In addition, 
we tested these primers in several species of Abies Mill. (Pinaceae; Table 1) 
with surprising success, amplifying between six and nine regions without any 
PCR optimization. We caution that our long PCR protocol works best using 
recent DNA extractions that have not been through multiple freeze-thaw cycles. 
Ideally, long PCR should be conducted using new DNA extractions that are 
stored at 4°C while performing experiments. Additionally, discrete PCR bands 
were only obtained using high-quality Taq polymerases. When conventional 
polymerases were used (e.g., GoTaq [Promega Corporation, Madison, Wisconsin, 
USA] or TopTaq [QIAGEN]), the resulting PCR products were smears rather 
than discrete bands and were not used for sequencing. 

To confirm that our long PCR approach was compatible with NGS and that 
our primers would yield complete chloroplast genomes, the amplicons from 
each of the 15 Orobanchaceae taxa were purified by precipitation in a 20% 
polyethylene glycol 8000 (PEG)/2.5 M NaCl solution and washed in 70% etha- 
nol. The amplicons were sheared by nebulization at 30 psi for 70 s, yielding an 
average shear size of 500 bp as measured by a Bioanalyzer High-Sensitivity 
Chip (Agilent Technologies, Santa Clara, California, USA). DNA normaliza- 
tion is a critical step when pooling samples for multiplexing in NGS; however, 
due to the large number of plastomes per cell and the very few samples that 
were being sequenced in such a high-throughput sequencing platform, no DNA 
quantification was made and the sheared amplicons were pooled by species at 
equal volume ratios. Sequencing libraries were constructed using the Illumina 
TruSeq library preparation kit and protocol (Illumina, San Diego, California, 
USA) and were standardized at 2 nM prior to sequencing. Library concentra- 
tions were determined using the KAPA qPCR kit (KK4835; Kapa Biosystems, 
Woburn, Massachusetts, USA) on an ABI StepOnePlus Real-Time PCR System 
(Life Technologies, Grand Island, New York, USA). The resulting libraries 
were multiplexed in one Illumina HiSeq 2000 lane (~187.5 million reads per
lane [Glenn, 2011]) at the Vincent J. Coates Genomics Sequencing Laboratory
at the University of California, Berkeley, yielding ~12.5 million 100-bp
single-end reads for each taxon (GenBank Sequence Read Archive accessions:
SRR1023085, SRR1023089, SRR1023095, SRR1023112, SRR1023113, 
SRR1023126, SRR1023128-SRR1023136). Average depth of coverage of our 
sequencing experiment was ~8333x (taking 150 kb as the average plastome 
size). The results obtained here clearly do not maximize the potential of the
Illumina HiSeq 2000 for plastome sequencing. To take full advantage of the large
amount of data produced by a HiSeq 2000 for plastome sequencing, it would be
theoretically possible to sequence ~4170 samples per lane and still reach the
30x minimum threshold generally regarded as ideal for plastome sequencing
(Straub et al., 2012). However, high-level multiplexing in NGS with this or 
any other high-throughput method requires careful normalization of DNA con- 
centrations across samples and sufficient adapter barcodes; commonly used 
commercial kits currently offer either 96 (NEXTflex DNA Barcode kit; Bioo 
Scientific, Austin, Texas, USA) or 386 (Fluidigm, San Francisco, California, 
USA). Alternatively, one could choose to perform this type of experiment on 
an NGS platform that yielded a lesser amount of data, e.g., 1 million 250-bp 
paired-end reads on an Illumina MiSeq Reagent Nano Kit version 2, which 
would allow a 30x sequencing depth for 96 samples (or 50x sequencing depth 
for 64 samples). 
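
The depth and multiplexing figures in this paragraph follow from dividing the total sequenced bases by the target size. The short Python sketch below reproduces that arithmetic; the function names are illustrative, and the MiSeq example assumes that "1 million 250-bp paired-end reads" means one million read pairs (two million 250-bp reads), which is an interpretation made here rather than something stated in the text.

```python
PLASTOME_BP = 150_000  # average plastome size used in the text


def mean_depth(n_reads, read_length_bp, target_bp=PLASTOME_BP):
    """Expected average depth: total sequenced bases divided by target size."""
    return n_reads * read_length_bp / target_bp


def max_samples(n_reads, read_length_bp, min_depth, target_bp=PLASTOME_BP):
    """Largest number of equally pooled samples that still reach min_depth."""
    return (n_reads * read_length_bp) // (target_bp * min_depth)


if __name__ == "__main__":
    # HiSeq 2000: ~12.5 million 100-bp single-end reads per taxon.
    print(round(mean_depth(12_500_000, 100)))        # roughly 8333x
    # Whole lane (~187.5 million reads) at a 30x minimum threshold.
    print(max_samples(187_500_000, 100, 30))         # about 4170 in the text
    # MiSeq Nano: assumed 1 million read pairs, i.e., 2 million 250-bp reads.
    miseq_depth = mean_depth(2_000_000, 250)
    print(miseq_depth / 96, miseq_depth / 64)
```

These are theoretical upper bounds: they assume every read maps to the plastome and that samples are pooled evenly, which is why careful normalization matters at high multiplexing levels.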

Because of the high depth of coverage of our sequencing experiment, reads 
were cleaned at high stringency (minimum quality = 30/40, maximum number 
of low-quality bases per read = 5, maximum number of duplicate reads = 10, 
minimum number of duplicate reads = 2) and assembled against a reference 
genome (Sesamum indicum L., GenBank accession no. JN637766) using the
Alignreads pipeline version 2.25 (Straub et al., 2011) with the following options:
percent identity = medium, minimum coverage depth = 5, and single nucleotide
polymorphism (SNP) minimum coverage depth = 25 with 80% of those reads 
supporting the SNP. The resulting assemblies had an average depth of ~700x, an 
average of 0.79% bases that were masked for not reaching the minimum sequencing
depth of 5x, and an average N50 of 35,053 bp (Table 1; contigs and ACE files
deposited in the Dryad Digital Repository: http://doi.org/10.5061/dryad.kc75n; 
Uribe-Convers et al., 2014). We noticed a small decrease in sequencing depth in 
regions immediately adjacent to some primer sites, which is a phenomenon that 
has been reported in the past (Whittall et al., 2010; Knaus et al., 2011; reviewed
in Cronn et al., 2012). Given that our shortest overlap between amplicons is 
135 bp (between regions 9 and 10; Table 2), with the rest spanning hundreds of 
base pairs (Table 2), and that our experiment yielded a high sequencing depth, we 
had no problems calling bases unambiguously (99.99% on average, Table 1). 
The Bartsia inaequalis Benth. assembly (Fig. 1; GenBank accession no. 
KF922718) was annotated using DOGMA (Wyman et al., 2004) and visualized 
in GenomeVx (Conant and Wolfe, 2008).



CONCLUSIONS 

We present an alternative approach for systematic studies 
that combines long PCR and NGS to strategically compile phy- 
logenomic data sets for molecular systematic studies. This ap- 
proach is on par with genome skimming in terms of costs, but it 
has the advantage of being a targeted approach and has the po- 
tential to produce data more uniformly across samples, i.e., 
minimizing missing data across taxa. Although this approach 
was only tested with chloroplast data, we emphasize that the 
long PCR amplicons can be generated using DNA from any 
genome, expanding the possibilities of long PCR and NGS for 
molecular systematic studies. This last point is important for 
studies targeting the mitochondrion or low-copy regions of the 
genome that otherwise might be missed or not shared across all 
samples using genome skimming approaches. For example, this 
approach may be particularly useful for the enrichment of nuclear 
regions, where intron sizes are large or unknown. 
Appendix 1. Protocol for long PCR for amplification of 4-20-kb targets. Developed by the Tank Laboratory, University of Idaho; published January 2014.



Product                                       Contents                                                                                              Catalog no.
QIAGEN Taq DNA Polymerase^1                   250 units Taq DNA Polymerase, 10x PCR Buffer,† 5x Q-Solution, 25 mM MgCl2                             201205
QIAGEN HotStar HiFidelity DNA Polymerase^2    100 units HotStar HiFidelity DNA Polymerase,^2 10x HotStar PCR Buffer, 5x Q-solution, 25 mM MgSO4     202602

1 Almost any high-quality Taq polymerase should work; however, cheap Taq polymerases (e.g., QIAGEN TopTaq or Promega GoTaq) do not work and
result in large smears, rather than discrete bands.

2 QIAGEN HotStar HiFidelity DNA Polymerase was the only high-fidelity polymerase used in this study.

† Q-solution does seem to be an important additive, thus the use of QIAGEN Taq. However, this does work using Q-solution with other high-quality
Taq polymerases such as Promega's or New England Biolabs' standard Taq (i.e., if you have a stock of Q-solution, but no QIAGEN Taq).

Genomic DNA must be high quality. Run a 0.8% or 1% gel to check. Standard CTAB extractions from silica gel-dried or herbarium material work well if they (1)
are recent (extraction and tissue), and (2) contain high-molecular-weight DNA. Most important, we have found that recent DNA extractions that have not been
through numerous freeze-thaw cycles work best. For best results, long PCR should be done using new DNA extractions stored at 4°C while performing
long PCR experiments.



All preparations should be done on ice. 

1. Number tubes or prepare plate. Make sure to include appropriate negative controls. 

2. Prepare QIAGEN HotStar HiFidelity DNA polymerase dilution: 



Reagents to prepare the HotStar Taq dilution   Volumes for 25 reactions   Volumes for 50 reactions   Volumes for 100 reactions
                                               (total 12.5 μL)            (total 25 μL)              (total 50 μL)
5x HotStar HiFidelity PCR buffer               2.5 μL                     5.0 μL                     10 μL
H2O                                            9.0 μL                     18 μL                      36 μL
QIAGEN HotStar Taq                             1.0 μL                     2.0 μL                     4.0 μL


3. Prepare cocktail: 


Cocktail                                                                  x1 (25 μL reaction)
10x PCR buffer (QIAGEN CoralLoad PCR Buffer or colorless, 15 mM MgCl2)    2.5 μL
MgCl2 (25 mM)                                                             1.0 μL (3 mM final conc.; adjustable)
dNTP (10 mM each)                                                         0.75 μL (3 μL of 2.5 mM each)
Q solution (5x)                                                           5.0 μL
5' primer (5 μM)                                                          2.5 μL (0.5 μM final conc.)
3' primer (5 μM)                                                          2.5 μL (0.5 μM final conc.)
Taq DNA polymerase (QIAGEN)                                               0.25 μL (1.25 units)^1
QIAGEN HotStar DNA polymerase (diluted)                                   0.50 μL
H2O                                                                       to 25 μL (9 μL if using 1.0 μL DNA)

1 The success rate was lower when a smaller quantity was used, but the best DNAs work with >0.125 μL.

4. Add 1-2 μL of template to each of the tubes.

5. While the tubes/plate with template are on ice, add 24 μL of cocktail to each tube, being careful not to cross contaminate. Spin down to bring all liquid to the
bottom of the tube.

6. Run appropriate long PCR profile. Generic temperatures and times are:

i. 93°C infinity (important to go directly from ice to hot block)

ii. 93°C for 3 min (initial denaturation)

iii. 93°C for 15 s

iv. 48-68°C for 30 s (Ta should be ~5°C below Tm of primers)

v. 68°C for 5-20 min (1 min/kb of target)

vi. go to step iii, 34x

vii. 4°C infinity

7. Check reactions by running 2 μL on 1% agarose gel with appropriate size standards.
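
The generic profile in step 6 can be parameterized by target size and primer melting temperature. The Python sketch below is a rough planning aid under stated assumptions: the function names are hypothetical, the annealing temperature is taken as 5°C below the primer Tm and clamped to the 48-68°C range given in the protocol (the clamping is an assumption of this example), and the time estimate ignores ramping and the initial and final holds.

```python
def annealing_temp_c(primer_tm_c):
    """Annealing temperature ~5 C below primer Tm, kept within 48-68 C.

    Clamping to the protocol's stated range is an assumption of this sketch.
    """
    return min(68, max(48, primer_tm_c - 5))


def extension_seconds(target_kb):
    """Extension time at 68 C, using the protocol's rule of 1 min per kb."""
    return round(target_kb * 60)


def long_pcr_profile(target_kb, primer_tm_c, cycles=35):
    """Return (initial step, cycled steps, cycle count) as (label, C, seconds)."""
    initial = ("initial denaturation", 93, 180)
    cycled = [
        ("denaturation", 93, 15),
        ("annealing", annealing_temp_c(primer_tm_c), 30),
        ("extension", 68, extension_seconds(target_kb)),
    ]
    return initial, cycled, cycles


def cycling_seconds(target_kb, primer_tm_c=60, cycles=35):
    """Programmed time excluding ramping and the 93 C / 4 C holds."""
    initial, cycled, n = long_pcr_profile(target_kb, primer_tm_c, cycles)
    return initial[2] + n * sum(step[2] for step in cycled)


if __name__ == "__main__":
    # Example: a 10-kb target with a hypothetical primer Tm of 60 C.
    seconds = cycling_seconds(10, primer_tm_c=60)
    print(f"{seconds / 3600:.1f} h of programmed cycling")
```

For a 10-kb target the programmed steps alone take over six hours, which is worth keeping in mind when scheduling thermocycler time for the larger regions.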
Primer combinations for long PCR amplification of the chloroplast genome.^1,2

Region no.  Approx. size (kb)  Primers (F/R)^3  Primer sequence (5'-3')
1           8                  trnH.GUG.6R      CCTTRATCCACTTGGCTACAT
                               psbK.195R        ACTTACAGCAGCTTGCCAAAC
2           10.3               trnQ.UUG.50R     GGACGGAAGGATTCGAACC
                               rpoC2.4805F      GYCGTATYGATTGGTTRAAAGG
2a^4        6.3                trnQ.UUG.50R     GGACGGAAGGATTCGAACC
                               atpH.17F         CTGCYGCTTCYGTTATTGCT
2b          4                  atpF.65R         CGGTATTAAACCCGAAACTCC
                               rpoC2.4805F      GYCGTATYGATTGGTTRAAAGG
3           7                  atpI.705R        CRGCTAAAGTTGCAAAAATAAGAGCT
                               rpoC1.1670F      GRGATCAAATGGCTGTTCAT
4           9                  rpoC2.520R       GTTCGTACAGCAGTATCYACAAC
                               petN.3R          GCCCAAGCRAGACTTACTATATCC
5           10.5               trnC.GCA.47F     CCCAGTTCAAATCCGGGT
                               psaB.2170F       GCRGCTTTCTTGATTGCYTC
6           10                 trnfM.CAU.21R    GGTTATGAGCCTTGCGAGCTA
                               trnT.UGU.17F     GGTTAGAGCATCGCATTTGTAATG
7           10.3               rps4.380R        GGTTTGCARCGATAACTTGGKATATC
                               rbcL.178R        GTCCATGTACCAGTAGARGATTC
8           9.2                rbcL.2F          TGTCACCACAAACAGARACTAAAG
                               psbJ.3F          GGCYGATACTACTGGAAGRAT
9           9.8                petA.920F        CTTCAAGAYCCATTACGTGTHCAAG
                               psbB.160R        TRCCYTGTCTCCACATTGGAT
10          10.9               psbB.3F          GGGTTTRCCTTGGTATCGTGT
                               rps3.17F.new     ATCCACTTGGTTTYMGACTTGG
11          8.7                rpl16.3R         AACCAACGAGTCACACACTAAGC
                               ycf2.5100R       CAGATCATGAATGTTTGGAATCCAT
12          10                 ycf2.2300F       TCGGGATCCTRATGCATATAGATAC
                               rps12.190F       GTTGCCAGAGTACGMTTAACCT
13^5        11                 rps12.360R       CCCTTGTTGACGATCCTTTACTC
                               ycf1.59R         CCGACCACAACGACCGAAT
14          11.2               trnN.GUU.7R      CCGCTCTACCACTGAGCTAC
                               ndhA.535F        GCTGCTCAATCDATTAGTTATGAA
14'^6       7                  trnR.ACG.15F     GAGGATTAGAGCACGTGG
                               ccsA.890R        TCCAAGTAATAAANGCCCAAGTTTC
15          10.5               ndhI.194R        CGAACRCATACTTCACAAGCAA
                               trnN.GUU.7R      CCGCTCTACCACTGAGCTAC
16          8.2                psbA.640F        GCTATGCATGGTTCYTTGGTAAC
                               ycf2.5100R       CAGATCATGAATGTTTGGAATCCAT



1 Universal primers designed by M.J.M.; compiled and tested by D.C.T. and S.U.C. 

2 Ta should be ~5°C below Tm of primers; however, temperatures of 55°C have worked for all primer combinations.

3 The name of each primer consists of three parts: (1) the gene in which the primer is anchored, (2) the approximate position of the primer within that
gene (based on all-angiosperm alignment per Moore et al., 2007), and (3) either an "F" or an "R." The F and R designations do not indicate that the primer 
should be used as a forward or reverse primer; rather, they indicate the 5' to 3' orientation of the primer with respect to the gene. In other words, a primer 
that is designated as an F primer has its 5' to 3' orientation in the same orientation as the gene (i.e., on the forward strand, or from start to stop), whereas 
an R primer is oriented in the direction opposite to the 5' to 3' orientation of the gene (i.e., on the reverse strand). 

4 Regions 2a and 2b can be used to amplify region 2 in two pieces. 

5 Regions 11, 12, and 13 represent a large portion of the inverted repeat (IR), thus, one amplification for both IRa and IRb. 
6 Region 14' amplifies ca. 2/3 of region 14. 